NSF PAR Search | NSF Public Access Repository

Biogeographic distribution of five Antarctic cyanobacteria using large-scale k-mer searching with sourmash branchwater

https://doi.org/10.3389/fmicb.2024.1328083

Lumian, Jessica; Sumner, Dawn Y; Grettenberger, Christen L; Jungblut, Anne D; Irber, Luiz; Pierce-Ward, N Tessa; Brown, C Titus (February 2024, Frontiers in Microbiology)

Cyanobacteria form diverse communities and are important primary producers in Antarctic freshwater environments, but their geographic distribution patterns in Antarctica and globally are still unresolved. There are however few genomes of cultured cyanobacteria from Antarctica available and therefore metagenome-assembled genomes (MAGs) from Antarctic cyanobacteria microbial mats provide an opportunity to explore distribution of uncultured taxa. These MAGs also allow comparison with metagenomes of cyanobacteria enriched communities from a range of habitats, geographic locations, and climates. However, most MAGs do not contain 16S rRNA gene sequences, making a 16S rRNA gene-based biogeography comparison difficult. An alternative technique is to use large-scale k-mer searching to find genomes of interest in public metagenomes. This paper presents the results of k-mer based searches for 5 Antarctic cyanobacteria MAGs from Lake Fryxell and Lake Vanda, assigned the names Phormidium pseudopriestleyi FRX01, Microcoleus sp. MP8IB2.171, Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, and Leptolyngbyaceae cyanobacterium MP9P1.79 in 498,942 unassembled metagenomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The Microcoleus sp. MP8IB2.171 MAG was found in a wide variety of environments, the P. pseudopriestleyi MAG was found in environments with challenging conditions, the Leptolyngbyaceae cyanobacterium MP9P1.79 MAG was only found in Antarctica, and the Leptolyngbya sp. BulkMat.35 and Pseudanabaenaceae cyanobacterium MP8IB2.15 MAGs were found in Antarctic and other cold environments. The findings based on metagenome matches and global comparisons suggest that these Antarctic cyanobacteria have distinct distribution patterns ranging from locally restricted to global distribution across the cold biosphere and other climatic zones.
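
To make the idea of k-mer searching concrete, the following is a minimal sketch of a containment search: what fraction of a query genome's k-mers also appear in each metagenome. It is not the branchwater implementation. The function names (`scaled_sketch`, `containment`, `search`), the hash function, the subsampling of hash space, and the 0.1 threshold are all assumptions chosen for illustration. Each metagenome is treated as a single string rather than a set of reads, and reverse-complement k-mers are ignored.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used below


def scaled_sketch(seq, k=21, scaled=10):
    """Keep the hashes of k-mers that fall in the lowest 1/scaled of hash space.

    Illustrative only: the hash function and parameters are assumptions,
    not those of any particular tool.
    """
    threshold = MAX_HASH // scaled
    kept = set()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h < threshold:
            kept.add(h)
    return kept


def containment(query, target):
    """Fraction of the query sketch that also appears in the target sketch."""
    if not query:
        return 0.0
    return len(query & target) / len(query)


def search(query_seq, metagenomes, min_containment=0.1, **params):
    """Return (name, containment) pairs at or above the threshold, best first.

    `metagenomes` is a hypothetical dict mapping a sample name to its sequence.
    """
    q = scaled_sketch(query_seq, **params)
    hits = []
    for name, seq in metagenomes.items():
        c = containment(q, scaled_sketch(seq, **params))
        if c >= min_containment:
            hits.append((name, c))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Because both sketches keep the same slice of hash space, a genome fully present in a metagenome gives a containment of 1.0 under this scheme, while an unrelated metagenome shares essentially no hashes and is filtered out by the threshold.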
Full Text Available
Streamlining data-intensive biology with workflow systems

https://doi.org/10.1093/gigascience/giaa140

Reiter, Taylor; Brooks†, Phillip T; Irber†, Luiz; Joslin†, Shannon E; Reid†, Charles M; Scott†, Camille; Brown, C Titus; Pierce-Ward, N Tessa (January 2021, GigaScience)

As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
Full Text Available
Large-scale sequence comparisons with sourmash

https://doi.org/10.12688/f1000research.19675.1

Pierce, N. Tessa; Irber, Luiz; Reiter, Taylor; Brooks, Phillip; Brown, C. Titus (January 2019, F1000Research)

The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
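
As a rough illustration of MinHash-based sketching, the following bottom-k sketch estimates Jaccard similarity between two sequences from a small, fixed number of k-mer hashes. It is a teaching sketch under stated assumptions, not sourmash's code or API: the function names, the BLAKE2b hash, and the defaults `k=21` and `num=500` are all chosen here for illustration, and reverse-complement k-mers are not handled.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (illustrative hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, num=500):
    """Bottom-k MinHash sketch: the num smallest distinct k-mer hashes, sorted."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:num]


def jaccard_estimate(a, b, num=500):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the num smallest hashes of the combined sketches and count how
    many of them occur in both inputs.
    """
    union_bottom = sorted(set(a) | set(b))[:num]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)
```

Comparing a sequence with itself, `jaccard_estimate(sketch(s), sketch(s))`, returns 1.0, and the memory cost per sequence is bounded by `num` hashes regardless of input length, which is what makes the comparison cheap for very large data sets.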
Full Text Available
Critical Assessment of Metagenome Interpretation: the second round of challenges

https://doi.org/10.1038/s41592-022-01431-4

Meyer, Fernando; Fritz, Adrian; Deng, Zhi-Luo; Koslicki, David; Lesker, Till Robin; Gurevich, Alexey; Robertson, Gary; Alser, Mohammed; Antipov, Dmitry; Beghini, Francesco; et al (April 2022, Nature Methods)

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
Full Text Available
